Abstract

Based on α-stable random projections with small α, we develop a simple algorithm for compressed sensing (sparse signal recovery) that uses only the signs (i.e., 1 bit) of the measurements. Using only 1-bit information of the measurements results in substantial cost reduction in collection, storage, communication, and decoding for compressed sensing. The proposed algorithm is efficient in that the decoding procedure requires only one scan of the coordinates. Our analysis shows precisely that, for a K-sparse signal of length N, $12.3 K \log(N/\delta)$ measurements (where δ is the confidence parameter) suffice for recovering the support and the signs of the signal. While the method is very robust against typical measurement noise, we also provide an analysis of the scheme under random flipping of the signs of the measurements.

Compared to the well-known work on 1-bit marginal regression (which can also be viewed as 
a one-scan method), the proposed algorithm requires orders of magnitude fewer measurements. 
Compared to 1-bit Iterative Hard Thresholding (IHT) (which is not a one-scan algorithm), our 
method is still significantly more accurate. Furthermore, the proposed method is reasonably 
robust against random sign flipping while IHT is known to be very sensitive to this type of noise. 


1 Introduction 

Compressed sensing (CS) [7, 2] is a popular and important topic in mathematics and engineering for recovering sparse signals from linear measurements. Here, we consider a K-sparse signal of length N, denoted by $x_i$, $i = 1$ to $N$. In our scheme, the linear measurements are collected as follows:

$$y_j = \sum_{i=1}^{N} x_i s_{ij}, \qquad j = 1, 2, \ldots, M, \qquad \text{where } s_{ij} \sim S(\alpha, 1),$$

where the $y_j$'s are the measurements and $s_{ij}$ is the $(i,j)$-th entry of the design matrix, sampled i.i.d. from an α-stable distribution with unit scale, denoted by $S(\alpha, 1)$. This differs from the classical framework of compressed sensing. Classical compressed sensing algorithms use a Gaussian design (i.e., α = 2 in the family of stable distributions) or a Gaussian-like design (e.g., a distribution with finite variance) and recover signals via computationally intensive methods such as linear programming [5] or greedy methods such as orthogonal matching pursuit (OMP) […, 18, 23].
The recent work [15] studied the use of α-stable random projections with α < 2 for accurate one-scan compressed sensing. Basically, if $Z \sim S(\alpha, 1)$, then its characteristic function is $E\left(e^{\sqrt{-1}Zt}\right) = e^{-|t|^\alpha}$, where $0 < \alpha \le 2$. Thus, both the Gaussian (α = 2) and the Cauchy (α = 1) distributions are special instances of the α-stable distribution family. Inspired by [15], we develop one-scan 1-bit compressed sensing by using a small α (e.g., α = 0.05) and only the sign information (i.e., $\mathrm{sgn}(y_j)$) of the measurements. Compared to alternatives, the proposed method is fast and accurate.

The problem of 1-bit compressed sensing has been studied in the literature of statistics, information theory, and machine learning, e.g., […]. 1-bit compressed sensing has many advantages. When measurements are collected, the hardware has to quantize them anyway. Also, using only the signs will potentially reduce the cost of storage and transmission (if the number of measurements does not have to increase too much). It appears, however, that current methods for 1-bit compressed sensing have not fully accomplished those goals. For example, […] showed that even with M/N = 2 (i.e., the number of measurements is twice the length of the signal), there are still noticeable recovery errors in their experiments. A recent work [1] also reported that even when the number of measurements exceeds the length of the signal, the errors are still observable.

In the experimental study in Section 6, our comparisons with 1-bit marginal regression [20, 22]
illustrate that the proposed method needs orders of magnitude fewer measurements. Compared 
to 1-bit Iterative Hard Thresholding (IHT) [11], our algorithm is still significantly more accurate. 
Furthermore, while our method is reasonably robust against random sign flipping, IHT is known 
to be very sensitive to that kind of noise. 

A distinct advantage of our proposed method is that, largely due to its one-scan nature, we can analyze the algorithm very precisely, with or without random flipping noise; we also provide the precise constants of the bounds. For example, even for a conservative version of our algorithm, the number of measurements required to succeed with probability at least $1 - \delta$ is no more than $12.3 K \log(N/\delta)$ (and the practical performance is even better). Here δ (e.g., 0.05) denotes the confidence parameter.

The method of Gaussian (i.e., α = 2) random projections has become extremely popular in machine learning and information theory (e.g., […]). The use of α-stable random projections was previously studied in the context of estimating the $l_\alpha$ norms (e.g., $\sum_i |x_i|^\alpha$) of data streams, in the theory literature […] as well as in machine learning venues […]. Consequently, our 1-bit CS algorithm also inherits this advantage when the data (signals) arrive in a streaming fashion [17].

The recent work [15] used α-stable projections with very small α to recover sparse signals, with many significant advantages: (i) the algorithm needs only one scan; (ii) the method is extremely robust against measurement noise (due to the heavy-tailed nature of the projections); and (iii) the recovery procedure is per coordinate, in that even when there are not sufficient measurements, a significant portion of the nonzero coordinates can still be recovered. The major disadvantage of [15] is that, since the measurements are also heavy-tailed, the required storage for the measurements might be substantial. Our proposed 1-bit algorithm provides one practical (and simple) solution.

2 The Proposed Algorithm 

In our algorithm, the entries (i.e., $s_{ij}$) of the design matrix are sampled i.i.d. from the α-stable distribution with unit scale, denoted by $S(\alpha, 1)$. We can follow the classical procedure [3] to generate samples from $S(\alpha, 1)$.
That is, we first sample independent exponential $w \sim \exp(1)$ and uniform $u \sim \mathrm{unif}(-\pi/2, \pi/2)$ variables, and then compute

$$g(u, w; \alpha) = \frac{\sin(\alpha u)}{(\cos u)^{1/\alpha}} \left[\frac{\cos(u - \alpha u)}{w}\right]^{(1-\alpha)/\alpha} \sim S(\alpha, 1). \qquad (1)$$


There are two excellent books on stable distributions […]. Basically, if $Z \sim S(\alpha, 1)$, then its characteristic function is $E\left(e^{\sqrt{-1}Zt}\right) = e^{-|t|^\alpha}$. However, closed-form expressions of the density exist only for α = 2 (i.e., Gaussian), α = 1 (i.e., Cauchy), or α = 0+.
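The recipe in (1) can be sketched in Python as follows. This is a minimal illustration written for this text rather than the author's code, and the helper names (`cms_log_abs`, `cms_sample`, `mean_inverse_power`) are hypothetical. For small α the magnitude |g| grows roughly like $w^{-1/\alpha}$ ($w^{-20}$ when α = 0.05) and easily overflows a float, so the sketch keeps the sign and $\log|g|$ instead. As a sanity check it compares a Monte Carlo average of $|S|^{-\alpha}$ with the fractional-moment formula $E|S|^\lambda = \frac{2}{\pi}\Gamma(\lambda)\Gamma(1 - \lambda/\alpha)\sin(\pi\lambda/2)$ evaluated at $\lambda = -\alpha$.

```python
import math
import random


def cms_log_abs(u, w, alpha):
    """Return (sign, log|g(u, w; alpha)|) for the map (1), with 0 < alpha <= 2.

    For small alpha, |g| behaves roughly like w ** (-1 / alpha) (w ** -20 when
    alpha = 0.05), which overflows a float, so we keep the logarithm instead.
    """
    sign = 1 if u > 0 else -1  # sin(alpha * u) has the sign of u on (-pi/2, pi/2)
    log_abs = (
        math.log(abs(math.sin(alpha * u)))
        - math.log(math.cos(u)) / alpha
        + (1.0 - alpha) / alpha * (math.log(math.cos((1.0 - alpha) * u)) - math.log(w))
    )
    return sign, log_abs


def cms_sample(u, w, alpha):
    """g(u, w; alpha) itself; only safe when |g| fits in a float (moderate alpha)."""
    sign, log_abs = cms_log_abs(u, w, alpha)
    return sign * math.exp(log_abs)


def mean_inverse_power(alpha, n, seed=0):
    """Monte Carlo estimate of E|S|^(-alpha) for S ~ S(alpha, 1), computed in log space."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.uniform(-math.pi / 2, math.pi / 2)
        w = rng.expovariate(1.0)
        _, log_abs = cms_log_abs(u, w, alpha)
        total += math.exp(-alpha * log_abs)  # |S|^(-alpha) without forming |S|
    return total / n


alpha = 0.05
# Fractional moment (2/pi) Gamma(lam) Gamma(1 - lam/alpha) sin(pi*lam/2) at lam = -alpha.
exact = -2.0 / math.pi * math.gamma(-alpha) * math.sin(math.pi * alpha / 2)
print(exact, mean_inverse_power(alpha, 20_000))
```

For α = 1 the map reduces to tan(u), a standard Cauchy draw, and for α = 2 to $2\sin(u)\sqrt{w}$, a Gaussian with variance 2; both identities make convenient unit checks.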


Alg. 1 summarizes our one-scan algorithm for recovering the signs of sparse signals.


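Algorithm 1 Stable measurement collection and the one-scan 1-bit algorithm for sign recovery.
Input: K-sparse signal $x \in \mathbb{R}^{1\times N}$, design matrix $S \in \mathbb{R}^{N\times M}$ with entries sampled from $S(\alpha, 1)$ with small α (e.g., α = 0.05). To generate the $(i,j)$-th entry $s_{ij}$, we sample $u_{ij} \sim \mathrm{uniform}(-\pi/2, \pi/2)$ and $w_{ij} \sim \exp(1)$ and compute $s_{ij} = g(u_{ij}, w_{ij}; \alpha)$ by (1).
Collect: Linear measurements: $y_j = \sum_{i=1}^{N} x_i s_{ij}$, $j = 1$ to $M$.
Compute: For each coordinate $i = 1$ to $N$, compute
$$Q_i^+ = \sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(y_j)\,\mathrm{sgn}(u_{ij})\,e^{-(K-1)w_{ij}}\right),$$
$$Q_i^- = \sum_{j=1}^{M} \log\left(1 - \mathrm{sgn}(y_j)\,\mathrm{sgn}(u_{ij})\,e^{-(K-1)w_{ij}}\right).$$
Output: For $i = 1$ to $N$, report the estimated sign:
$$\widehat{\mathrm{sgn}}(x_i) = \begin{cases} +1 & \text{if } Q_i^+ > 0,\\ -1 & \text{if } Q_i^- > 0,\\ 0 & \text{if } Q_i^+ \le 0 \text{ and } Q_i^- \le 0.\end{cases}$$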


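The central component of the algorithm is to compute $Q_i^+$ and $Q_i^-$, for $i = 1$ to $N$, where

$$Q_i^+ = \sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(y_j)\,\mathrm{sgn}(u_{ij})\,e^{-(K-1)w_{ij}}\right), \qquad (2)$$

$$Q_i^- = \sum_{j=1}^{M} \log\left(1 - \mathrm{sgn}(y_j)\,\mathrm{sgn}(u_{ij})\,e^{-(K-1)w_{ij}}\right). \qquad (3)$$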

Later we will explain that it makes no essential difference if we replace $\mathrm{sgn}(u_{ij})$ with $\mathrm{sgn}(s_{ij})$ and $w_{ij}$ with $1/|s_{ij}|^\alpha$. The parameter α should be reasonably small, e.g., α = 0.05. In many prior studies of compressed sensing, K is assumed to be known. Interestingly, even if K is unknown, it can still be reliably estimated in our framework using only a very small number (e.g., 5) of measurements, as validated in Sec. 6.4.
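The Compute and Output steps translate almost line by line into NumPy. The sketch below is our own illustration (the name `one_scan_signs` and the (N, M) array layout are assumptions, not from the paper). It reads only $\mathrm{sgn}(y_j)$, $\mathrm{sgn}(u_{ij})$, and $w_{ij}$, so each coordinate is decoded in a single pass over its M entries. Because $\log(1+a) + \log(1-a) = \log(1-a^2) < 0$ for $0 < |a| < 1$, the code also illustrates that $Q_i^+ + Q_i^- < 0$, which is consistent with Lemma 1 in Section 4.

```python
import numpy as np


def one_scan_signs(sign_y, u, w, K):
    """Decoder of Alg. 1 with threshold 0.

    sign_y: shape (M,), the 1-bit measurements sgn(y_j).
    u, w:   shape (N, M), the uniform / exponential variables behind s_ij.
    K:      sparsity level (assumed known here).
    Returns (Q_plus, Q_minus, estimated_signs) with signs in {-1, 0, +1}.
    """
    sign_y = np.asarray(sign_y, dtype=float)
    # a_ij = sgn(y_j) sgn(u_ij) exp(-(K-1) w_ij), which lies in (-1, 1) for w_ij > 0
    a = sign_y[None, :] * np.sign(u) * np.exp(-(K - 1) * w)
    q_plus = np.log1p(a).sum(axis=1)    # Q_i^+ as in (2)
    q_minus = np.log1p(-a).sum(axis=1)  # Q_i^- as in (3); q_plus + q_minus < 0 always
    est = np.zeros(u.shape[0], dtype=int)
    est[q_plus > 0] = 1                 # rule (4)
    est[q_minus > 0] = -1
    return q_plus, q_minus, est


# Random inputs, only to exercise the rule; real inputs come from Alg. 1.
rng = np.random.default_rng(0)
N, M, K = 50, 200, 5
u = rng.uniform(-np.pi / 2, np.pi / 2, size=(N, M))
w = rng.exponential(1.0, size=(N, M))
sign_y = rng.choice([-1.0, 1.0], size=M)
q_plus, q_minus, est = one_scan_signs(sign_y, u, w, K)
print(np.any((q_plus > 0) & (q_minus > 0)))  # False: never both positive
```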

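To make the theoretical analysis easier, Alg. 1 uses 0 as the threshold for estimating the sign:

$$\widehat{\mathrm{sgn}}(x_i) = \begin{cases} +1 & \text{if } Q_i^+ > 0,\\ -1 & \text{if } Q_i^- > 0,\\ 0 & \text{if } Q_i^+ \le 0 \text{ and } Q_i^- \le 0.\end{cases} \qquad (4)$$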
Later in the paper, Lemma 1 will show that at most one of $Q_i^+$ and $Q_i^-$ can be positive. Using 0 as the threshold simplifies the analysis. As will be shown in our experiments, a more practical version of the algorithm reduces the number of measurements below that predicted by the analysis.

Note that, unless the signal is ternary (i.e., $x_i \in \{-1, 0, 1\}$), we need another procedure for estimating the values of the nonzero entries. A simple strategy is to perform least squares on the reported coordinates by collecting K additional measurements.
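A minimal sketch of this least-squares step, under assumptions the paper does not spell out: the K additional measurements are taken here with a hypothetical Gaussian design, the reported support is taken to be correct, and the names `refine_values` and `A_full` are ours. Since x is zero off the support, only the design columns on the support enter the fit; with K noiseless measurements the system is square and generically invertible.

```python
import numpy as np


def refine_values(support, A_support, y_extra, N):
    """Least-squares estimate of the nonzero values on a reported support.

    A_support: (m, len(support)) columns of the extra design on the support, m >= len(support).
    y_extra:   the m additional full-precision measurements.
    """
    coef, *_ = np.linalg.lstsq(A_support, y_extra, rcond=None)
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat


rng = np.random.default_rng(1)
N, K = 1000, 20
support = rng.choice(N, size=K, replace=False)  # pretend this support was recovered
x = np.zeros(N)
x[support] = rng.normal(0.0, 5.0, size=K)
A_full = rng.normal(size=(K, N))                # hypothetical Gaussian extra design
y_extra = A_full @ x                            # K extra noiseless measurements
x_hat = refine_values(support, A_full[:, support], y_extra, N)
print(np.max(np.abs(x_hat - x)))                # round-off level
```

With measurement noise one would typically collect somewhat more than K extra measurements so that the least-squares fit averages out the noise.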

Next, we will present the intuition and theory for the proposed algorithm. 


3 Intuition 


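Our proposed algorithm, through the use of $Q_i^+$ and $Q_i^-$, is based on the joint likelihood of $(\mathrm{sgn}(y_j), s_{ij})$. Denote the density function of $S(\alpha, 1)$ by $f_S(s)$. Recall

$$y_j = \sum_{t=1}^{N} x_t s_{tj} = x_i s_{ij} + \sum_{t \ne i} x_t s_{tj} = x_i s_{ij} + \theta_i S_j, \qquad (5)$$

where $S_j \sim S(\alpha, 1)$ is independent of $s_{ij}$ and $\theta_i = \left(\sum_{t \ne i} |x_t|^\alpha\right)^{1/\alpha}$. Using a conditional probability argument, the joint density of $(y_j, s_{ij})$ can be shown to be $\frac{1}{\theta_i} f_S(s_{ij})\, f_S\left(\frac{y_j - x_i s_{ij}}{\theta_i}\right)$. Now, suppose we only use (store) the sign information of $y_j$. We have

$$\Pr(y_j > 0, s_{ij}) = \int_0^\infty \frac{1}{\theta_i} f_S(s_{ij})\, f_S\left(\frac{y - x_i s_{ij}}{\theta_i}\right) dy = f_S(s_{ij})\, F_S\left(\frac{x_i s_{ij}}{\theta_i}\right),$$

where $F_S$ is the cumulative distribution function (cdf) of $S(\alpha, 1)$ (the last step uses the symmetry of $S(\alpha, 1)$). Similarly,

$$\Pr(y_j < 0, s_{ij}) = \int_{-\infty}^0 \frac{1}{\theta_i} f_S(s_{ij})\, f_S\left(\frac{y - x_i s_{ij}}{\theta_i}\right) dy = f_S(s_{ij})\, F_S\left(-\frac{x_i s_{ij}}{\theta_i}\right),$$

which means that, up to an additive term that does not depend on $(x_i, \theta_i)$, the joint log-likelihood is $l(x_i, \theta_i) = \sum_{j=1}^{M} \log F_S\left(\mathrm{sgn}(y_j)\frac{x_i s_{ij}}{\theta_i}\right)$.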

Since our algorithm uses a small α, we can take advantage of the limit at α = 0+. Suppose $u \sim \mathrm{uniform}(-\pi/2, \pi/2)$ and $w \sim \exp(1)$. Using (1), we can express $Z = g(u, w; \alpha) \approx \mathrm{sgn}(u)/w^{1/\alpha}$. In other words, in the limit $\alpha \to 0+$, $1/|Z|^\alpha \sim \exp(1)$. This fact was originally established by [6] and was used in [12] to derive the harmonic mean estimator (16) of K.

Therefore, as $\alpha \to 0+$, we can write the cdf as $F_S(s) = \frac{1}{2} + \frac{\mathrm{sgn}(s)}{2} e^{-1/|s|^\alpha}$, which leads to (after dropping the constant $-M \log 2$)

$$l(x_i, \theta_i) = \sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(s_{ij} x_i y_j) \exp\left(-\frac{\theta_i^\alpha}{|x_i s_{ij}|^\alpha}\right)\right).$$
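Clearly, if $x_i = 0$, then $l(x_i, \theta_i) = 0$. This is why it is convenient to use 0 as the threshold. We can then use the following $Q_i^+$ and $Q_i^-$, obtained by evaluating $l(x_i, \theta_i)$ at a positive and at a negative $x_i$, respectively, to determine whether $x_i > 0$ or $x_i < 0$:

$$Q_i^+ = \sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(s_{ij} y_j) \exp\left(-\frac{K-1}{|s_{ij}|^\alpha}\right)\right), \qquad Q_i^- = \sum_{j=1}^{M} \log\left(1 - \mathrm{sgn}(s_{ij} y_j) \exp\left(-\frac{K-1}{|s_{ij}|^\alpha}\right)\right).$$

As $\alpha \to 0+$, we have $\theta_i^\alpha = K - 1$ (if $x_i \ne 0$) or $K$ (if $x_i = 0$). Also note that $|x_i|^\alpha = 0$ (if $x_i = 0$) or 1 (if $x_i \ne 0$). Because $\mathrm{sgn}(s_{ij}) = \mathrm{sgn}(u_{ij})$ and $1/|s_{ij}|^\alpha$ becomes $w_{ij}$, we can write them as

$$Q_i^+ = \sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(y_j)\,\mathrm{sgn}(u_{ij})\,e^{-(K-1)w_{ij}}\right), \qquad Q_i^- = \sum_{j=1}^{M} \log\left(1 - \mathrm{sgn}(y_j)\,\mathrm{sgn}(u_{ij})\,e^{-(K-1)w_{ij}}\right).$$

This is why we compute $Q_i^+$ and $Q_i^-$ as in (2) and (3), respectively.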


So far, we have explained the idea behind our proposed Alg. 1. Next, we conduct further theoretical analysis of the error probabilities and, consequently, the sample complexity bound.


4 Analysis 


Our analysis will repeatedly use the fact that $\mathrm{sgn}(s_{ij} y_j) = \mathrm{sgn}(y_j/s_{ij}) = \mathrm{sgn}(x_i + \theta_i S_j/s_{ij})$, where $S_j \sim S(\alpha, 1)$ is independent of $s_{ij}$ and $\theta_i = \left(\sum_{t \ne i} |x_t|^\alpha\right)^{1/\alpha}$. Note that both $s_{ij}$ and $y_j$ are symmetric random variables.

Our first lemma says that at most one of $Q_i^+$ and $Q_i^-$, respectively defined in (2) and (3), can be positive.


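Lemma 1 If $Q_i^+ > 0$ then $Q_i^- < 0$. If $Q_i^- > 0$ then $Q_i^+ < 0$.

Proof: It is more convenient to examine $e^{Q_i^+}$ and $e^{Q_i^-}$ and compare them with 1. Let $z_j = e^{-(K-1)w_{ij}}$. Note that $0 < z_j < 1$. Now suppose $e^{Q_i^+} > 1$. We divide the coordinates, $j = 1$ to $M$, into two disjoint sets I and II, according to whether $\mathrm{sgn}(y_j)\,\mathrm{sgn}(u_{ij})$ equals $+1$ (set I) or $-1$ (set II), such that

$$e^{Q_i^+} = \prod_{j \in \mathrm{I}} |1 + z_j| \prod_{j \in \mathrm{II}} |1 - z_j| > 1.$$

Because $\frac{1}{1 - z_j} > 1 + z_j$ and $\frac{1}{1 + z_j} > 1 - z_j$, we must have

$$\prod_{j \in \mathrm{I}} \frac{1}{|1 - z_j|} \prod_{j \in \mathrm{II}} \frac{1}{|1 + z_j|} > \prod_{j \in \mathrm{I}} |1 + z_j| \prod_{j \in \mathrm{II}} |1 - z_j| > 1,$$

which means we must have

$$e^{Q_i^-} = \prod_{j \in \mathrm{I}} |1 - z_j| \prod_{j \in \mathrm{II}} |1 + z_j| < 1.$$

This completes the proof.

□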
Although Lemma 1 suggests that it is convenient to use 0 as the threshold, we provide more general error probability tail bounds by comparing $Q_i^+$ and $Q_i^-$ with $\epsilon M/K$, where ε is not necessarily nonnegative. The following intuition might help to see why $M/K$ is the right scale:

$$Q_i^+ = \sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(y_j/s_{ij}) \exp(-(K-1)w_{ij})\right) \le \sum_{j=1}^{M} \left|\log\left(1 + \mathrm{sgn}(y_j/s_{ij}) \exp(-(K-1)w_{ij})\right)\right| \approx \sum_{j=1}^{M} \exp(-(K-1)w_{ij}),$$

where the approximation uses $|\log(1 \pm z)| \approx z$ for small z. By the moment generating function of the exponential distribution, we know that

$$E\left(\sum_{j=1}^{M} \exp(-(K-1)w_{ij})\right) = \sum_{j=1}^{M} E \exp(-(K-1)w_{ij}) = \frac{M}{1 + K - 1} = \frac{M}{K}.$$
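A quick Monte Carlo check of the last identity (a sketch; the helper `mean_decay` is hypothetical): for $w \sim \exp(1)$, $E e^{-(K-1)w} = 1/(1 + (K-1)) = 1/K$, so a sum of M such terms has mean $M/K$.

```python
import math
import random


def mean_decay(K, n, seed=0):
    """Monte Carlo estimate of E exp(-(K-1) w) with w ~ exp(1); the exact value is 1/K."""
    rng = random.Random(seed)
    return sum(math.exp(-(K - 1) * rng.expovariate(1.0)) for _ in range(n)) / n


M = 1000
for K in (5, 20, 100):
    est = mean_decay(K, 50_000)
    print(f"K={K}: M*E[exp(-(K-1)w)] ~ {M * est:.1f}, M/K = {M / K:.1f}")
```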


Lemma 2 concerns the error probability (i.e., the false positive probability) when $x_i = 0$ and $\epsilon M/K$ is used as the threshold.


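Lemma 2 For any ε and any $t > 0$, we have

$$\Pr\left(Q_i^+ > \epsilon M/K,\ x_i = 0\right) = \Pr\left(Q_i^- > \epsilon M/K,\ x_i = 0\right) \le \exp\left\{-\frac{M}{K} H_1(t; \epsilon, K)\right\}, \qquad (6)$$

where

$$H_1(t; \epsilon, K) = \epsilon t - K \log\left(1 + \frac{t(t-1)}{(2K-1)2!} + \frac{t(t-1)(t-2)(t-3)}{(4K-3)4!} + \cdots\right) = \epsilon t - K \log\left(1 + \sum_{n=2,4,6,\ldots} \frac{1}{n(K-1)+1} \prod_{l=0}^{n-1} \frac{t-l}{n-l}\right). \qquad (7)$$

In the limit as $K \to \infty$, we have

$$H_1(t; \epsilon, \infty) = \epsilon t - \left(\frac{t(t-1)}{2 \times 2!} + \frac{t(t-1)(t-2)(t-3)}{4 \times 4!} + \cdots\right) = \epsilon t - \sum_{n=2,4,6,\ldots} \frac{1}{n} \prod_{l=0}^{n-1} \frac{t-l}{n-l}. \qquad (8)$$

Proof: See Appendix A. □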


To minimize the error probability in Lemma 2, we need to seek the optimum (maximum) values of $H_1$ for given ε and K. Figure 1 plots the optimum values $t = t_1^*$ as well as the optimum values $H_1^*$ for K = 5 to 100. As expected, these optimum values are insensitive to K (in fact, there is no essential difference from the limiting case $K \to \infty$). At ε = 0, the value of $1/H_1^*$ is about 12.2. Note that, to keep the error probability at most δ, the required number of measurements will be $M \ge \frac{K}{H_1^*} \log(N/\delta)$. Thus, we use the numerical value 12.3 for the bound of the sample complexity.
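The optimization behind Figure 1 can be approximated with a short script. In this sketch (function names are ours), the generalized binomial coefficients $\prod_{l=0}^{n-1}(t-l)/(n-l)$ in (7) are built recursively, the series is truncated at a finite `n_max`, and t is chosen by grid search; passing `math.inf` for K evaluates the limit (8). At ε = 0 the resulting $1/H_1^*$ should come out close to 12 (slightly smaller for small K, since the finite-K exponent is larger), in line with the discussion above.

```python
import math


def H1(t, eps, K, n_max=1000):
    """H_1(t; eps, K) from (7), or the limit (8) when K is math.inf; series truncated at n_max."""
    total, c = 0.0, 1.0
    for n in range(1, n_max + 1):
        c *= (t - n + 1) / n  # c = prod_{l=0}^{n-1} (t - l) / (n - l)
        if n % 2 == 0:
            total += c / n if math.isinf(K) else c / (n * (K - 1) + 1)
    if math.isinf(K):
        return eps * t - total
    return eps * t - K * math.log1p(total)


def maximize(H, eps, K, grid):
    """Grid search for t* = argmax_t H(t; eps, K); returns (t*, H*)."""
    return max(((t, H(t, eps, K)) for t in grid), key=lambda pair: pair[1])


grid = [i / 200 for i in range(1, 401)]  # t in (0, 2]
for K in (5, 20, 100, math.inf):
    t_star, h_star = maximize(H1, 0.0, K, grid)
    print(f"K={K}: t*={t_star:.3f}  1/H1*={1 / h_star:.2f}")
```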
Figure 1: For Lemma 2, we plot the optimum values $t = t_1^*$ (left panel) that maximize $H_1(t; \epsilon, K)$, as well as the optimum values $H_1^*$ at $t = t_1^*$ (right panel), for K = 5 to 100. The different curves essentially overlap. At the threshold ε = 0, the value $1/H_1^*$ is about 12.2 (and smaller than 12.3).

Next, Lemma 3 concerns the false negative error probability when $x_i \ne 0$.


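Lemma 3 For any ε, $0 < t < 1$, and $\alpha \to 0$, we have

$$\Pr\left(Q_i^+ < \epsilon M/K,\ x_i > 0\right) = \Pr\left(Q_i^- < \epsilon M/K,\ x_i < 0\right) \le \exp\left\{-\frac{M}{K} H_2(t; \epsilon, K)\right\}, \qquad (9)$$

where

$$H_2(t; \epsilon, K) = -\epsilon t - K \log[A], \qquad (10)$$

$$A = 1 + \sum_{n=2,4,6,\ldots} \frac{1}{n(K-1)+1} \prod_{l=0}^{n-1} \frac{t+l}{n-l} - \sum_{n=1,3,5,\ldots} \frac{1}{(n+1)(K-1)+1} \prod_{l=0}^{n-1} \frac{t+l}{n-l},$$

and

$$H_2(t; \epsilon, \infty) = -\epsilon t - \sum_{n=2,4,6,\ldots} \frac{1}{n} \prod_{l=0}^{n-1} \frac{t+l}{n-l} + \sum_{n=1,3,5,\ldots} \frac{1}{n+1} \prod_{l=0}^{n-1} \frac{t+l}{n-l}. \qquad (11)$$

Proof: See Appendix B. □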


Figure 2 plots the optimum values $t_2^*$ that maximize $H_2$, together with the optimum values $H_2^*$. Interestingly, when ε = 0, the value of $1/H_2^*$ is also about 12.2 (smaller than 12.3). This is not surprising because, for both $H_1(t; \epsilon, \infty)$ and $H_2(t; \epsilon, \infty)$, the leading terms of the series at ε = 0 add up to $t(1-t)/4$.
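A companion sketch for Lemma 3 (names ours): it evaluates A in (10), or the limit (11) when K is `math.inf`, using the rising products $\prod_{l=0}^{n-1}(t+l)/(n-l)$, and maximizes $H_2$ over a grid in (0, 1). Truncating at an even n keeps each odd-n term paired with the next even-n term of opposite sign, which keeps the truncated alternating series accurate.

```python
import math


def H2(t, eps, K, n_max=2000):
    """H_2(t; eps, K) from (10), or the limit (11) when K is math.inf; requires 0 < t < 1.

    n_max is even, so every odd-n term is paired with the following even-n term.
    """
    total, c = 0.0, 1.0
    for n in range(1, n_max + 1):
        c *= (t + n - 1) / n  # c = prod_{l=0}^{n-1} (t + l) / (n - l)
        if math.isinf(K):
            total += -c / n if n % 2 == 0 else c / (n + 1)
        else:
            total += c / (n * (K - 1) + 1) if n % 2 == 0 else -c / ((n + 1) * (K - 1) + 1)
    if math.isinf(K):
        return -eps * t + total
    return -eps * t - K * math.log1p(total)


grid = [i / 200 for i in range(1, 200)]  # t in (0, 1)
for K in (5, 20, 100, math.inf):
    t_star, h_star = max(((t, H2(t, 0.0, K)) for t in grid), key=lambda pair: pair[1])
    print(f"K={K}: t*={t_star:.3f}  1/H2*={1 / h_star:.2f}")
```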


Sample Complexity. Given K, N, ε, and δ, the required number of measurements can be computed from

$$(N - K) \times \Pr\left(Q_i^+ > \epsilon M/K,\ x_i = 0\right) + K \times \Pr\left(Q_i^+ < \epsilon M/K,\ x_i > 0\right) \le \delta.$$

When ε = 0, because the constants $1/H_1^*$ and $1/H_2^*$ of both error probabilities are upper bounded by 12.3, we obtain a convenient expression of complexity, which we present as Theorem 1.

Theorem 1 Using Alg. 1, in order for the total error (for estimating the signs) of all the coordinates to be bounded by some δ > 0, it suffices to use $M = \lceil 12.3 K \log(N/\delta) \rceil$ measurements.
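As a reading aid, the calculation behind Theorem 1 can be written out from the condition above with ε = 0, using only that the optimal exponents satisfy $H_1^*, H_2^* \ge 1/12.3$ (equivalently, both constants are at most 12.3):

$$
\begin{aligned}
(N-K)\Pr\left(Q_i^+ > 0,\ x_i = 0\right) + K\Pr\left(Q_i^+ < 0,\ x_i > 0\right)
&\le (N-K)\,e^{-\frac{M}{K}H_1^*} + K\,e^{-\frac{M}{K}H_2^*}\\
&\le N \exp\left(-\frac{M}{12.3K}\right)\\
&\le \delta \qquad \text{whenever } M \ge 12.3\,K\log(N/\delta).
\end{aligned}
$$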
Figure 2: For Lemma 3, we plot the optimum values $t = t_2^*$ (left panel) that maximize $H_2(t; \epsilon, K)$, as well as the optimum values $H_2^*$ at $t = t_2^*$ (right panel), for K = 5 to 100. The different curves essentially overlap. At ε = 0, the value of $1/H_2^*$ is again about 12.2 (which is smaller than 12.3).


5 Recovery Under Noise 

We can add measurement noise: $y_j = \sum_{i=1}^{N} x_i s_{ij} + n_j$, where typically $n_j \sim N(0, \sigma^2)$ at some noise level σ. The framework of sparse recovery using α-stable random projections with small α is extremely (or boringly) robust against this type of measurement noise [15]. To make the study more interesting, we consider another common noise model for 1-bit compressed sensing: randomly flipping the signs of the measurements.

That is, we introduce independent variables $r_j$, $j = 1$ to $M$, such that $r_j = 1$ with probability $1 - \gamma$ and $r_j = -1$ with probability γ. During recovery, we use $r_j y_j$ in place of the original $y_j$. To distinguish this case from the previous notation, we use $Q_i^{+,\gamma}$ and $Q_i^{-,\gamma}$ to replace $Q_i^+$ and $Q_i^-$, respectively.

Interestingly, Lemma 4 shows that random flipping does not affect the false positive probability.


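Lemma 4 For any ε and any $t > 0$, we have

$$\Pr\left(Q_i^{+,\gamma} > \epsilon M/K,\ x_i = 0\right) = \Pr\left(Q_i^{-,\gamma} > \epsilon M/K,\ x_i = 0\right) \le \exp\left\{-\frac{M}{K} H_1(t; \epsilon, K)\right\}, \qquad (12)$$

where $H_1(t; \epsilon, K)$ is the same as in Lemma 2.

Proof: See Appendix C. The key is that $\mathrm{sgn}(r_j u_{ij})$ and $\mathrm{sgn}(u_{ij})$ have the same distribution. □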

On the other hand, as shown in the next lemma, this random flipping (with probability γ) does affect the false negative probability.

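Lemma 5 For any ε, $0 < t < 1$, and $\alpha \to 0$, we have

$$\Pr\left(Q_i^{+,\gamma} < \epsilon M/K,\ x_i > 0\right) = \Pr\left(Q_i^{-,\gamma} < \epsilon M/K,\ x_i < 0\right) \le \exp\left\{-\frac{M}{K} H_3(t; \epsilon, K, \gamma)\right\}, \qquad (13)$$

where

$$H_3(t; \epsilon, K, \gamma) = -\epsilon t - K \log[B], \qquad (14)$$

$$B = 1 + \sum_{n=2,4,6,\ldots} \frac{1}{n(K-1)+1} \prod_{l=0}^{n-1} \frac{t+l}{n-l} - (1 - 2\gamma) \sum_{n=1,3,5,\ldots} \frac{1}{(n+1)(K-1)+1} \prod_{l=0}^{n-1} \frac{t+l}{n-l},$$

$$H_3(t; \epsilon, \infty, \gamma) = -\epsilon t - \sum_{n=2,4,6,\ldots} \frac{1}{n} \prod_{l=0}^{n-1} \frac{t+l}{n-l} + (1 - 2\gamma) \sum_{n=1,3,5,\ldots} \frac{1}{n+1} \prod_{l=0}^{n-1} \frac{t+l}{n-l}. \qquad (15)$$

Proof: See Appendix D. □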

From Lemma 4 and Lemma 5, we can numerically compute the required number of measurements for any given N and K. We also provide an empirical study in Section 6.
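One way to carry out this numerical computation is sketched below. It is our illustration, not the paper's procedure: for brevity it takes ε = 0, uses the $K \to \infty$ exponents (8) and (15) from Lemmas 4 and 5, maximizes each over a grid of t in (0, 1), and then bisects for the smallest M with $(N-K)e^{-(M/K)H_1^*} + K e^{-(M/K)H_3^*} \le \delta$. The function names are hypothetical, and the code refuses to return an M when the maximized $H_3$ is not positive, since the bound is then vacuous.

```python
import math


def H1_inf(t, n_max=1000):
    """Limit (8) of H_1 at eps = 0."""
    total, c = 0.0, 1.0
    for n in range(1, n_max + 1):
        c *= (t - n + 1) / n  # prod_{l<n} (t - l)/(n - l)
        if n % 2 == 0:
            total += c / n
    return -total


def H3_inf(t, gamma, n_max=2000):
    """Limit (15) of H_3 at eps = 0; requires 0 < t < 1."""
    total, c = 0.0, 1.0
    for n in range(1, n_max + 1):
        c *= (t + n - 1) / n  # prod_{l<n} (t + l)/(n - l)
        total += -c / n if n % 2 == 0 else (1 - 2 * gamma) * c / (n + 1)
    return total


def required_M(N, K, delta, gamma):
    """Smallest M with (N-K) exp(-M H1*/K) + K exp(-M H3*/K) <= delta (eps = 0)."""
    grid = [i / 200 for i in range(1, 200)]  # t in (0, 1)
    h1 = max(H1_inf(t) for t in grid)
    h3 = max(H3_inf(t, gamma) for t in grid)
    if h3 <= 0:
        raise ValueError("the bound of Lemma 5 is vacuous for this gamma")

    def total_error(M):
        return (N - K) * math.exp(-M * h1 / K) + K * math.exp(-M * h3 / K)

    lo, hi = 0, math.ceil(K * math.log(N / delta) / min(h1, h3))  # hi satisfies the bound
    while hi - lo > 1:  # bisection: total_error(lo) > delta >= total_error(hi)
        mid = (lo + hi) // 2
        if total_error(mid) <= delta:
            hi = mid
        else:
            lo = mid
    return hi


for gamma in (0.0, 0.05, 0.1):
    print(gamma, required_M(1000, 20, 0.01, gamma))
```

Larger γ lowers $H_3^*$ and therefore raises the bound on M, mirroring the effect of sign flipping on the false negative probability.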


6 Experiments and Comparisons 

In this section, we provide a series of experimental studies to verify the proposed algorithm. In the literature, the so-called 1-bit marginal regression [20, 22] can be viewed as a one-scan algorithm, and hence it is the natural competitor to compare our method with. As shown in the experiments, however, the proposed method needs orders of magnitude fewer measurements than 1-bit marginal regression. Thus, to make the empirical study more interesting, we also compare the method with the well-known 1-bit Iterative Hard Thresholding (IHT) [11]. The results show that the proposed algorithm is still significantly more accurate. Furthermore, our method is reasonably robust against random sign flipping, while IHT is known to be very sensitive to that kind of noise.

6.1 A Practical Variant of Alg. 1

Although Alg. 1 is convenient for theoretical analysis, its practical performance can be improved by a simple variant based on ranking, whose theoretical analysis would, however, be more difficult.

Basically, after we have computed $Q_i^+$ and $Q_i^-$ from (2) and (3), for $i = 1$ to $N$, instead of using 0 as the threshold, we choose the top-K coordinates ranked by $\max\{Q_i^+, Q_i^-\}$. Among the selected coordinates, if $Q_i^+ > Q_i^-$ (or $Q_i^- > Q_i^+$), then we estimate $\mathrm{sgn}(x_i)$ to be positive (or negative). This procedure implicitly uses an ε away from 0 and is hence less conservative than vanilla Alg. 1. In our experimental study, we always adopt this variant.

6.2 Experiment Set-up 

In our experiments, we generate signals based on the two parameters N and K. We choose $(N, K) \in \{(1000, 20), (1000, 50), (10000, 20), (10000, 50)\}$. For each given N and K, we first randomly select K nonzero coordinates and then assign the values of the nonzero entries according to i.i.d. samples from $N(0, 5^2)$. We then apply our proposed variant of Alg. 1 to recover both the support and the signs of the signal. The number of measurements is set according to

$$M = \zeta K \log(N/\delta),$$

where the confidence δ is set to 0.01. We vary the parameter ζ from 2 to 15. Note that this choice of M is typically small compared to N. Recall that, in our analysis, the required number of measurements using criterion (4) is proved to be $12.3 K \log(N/\delta)$, although the number of measurements actually needed will be smaller.
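The pipeline of Sections 6.1 to 6.3 can be sketched end to end as follows. This is our own illustrative NumPy code, not the author's; the name `simulate`, the default problem size, and the log-space evaluation of $\mathrm{sgn}(y_j)$ (used because $|s_{ij}|$ overflows float64 when α = 0.05) are assumptions. It draws a K-sparse signal with $N(0, 5^2)$ nonzeros, sets $M = \lceil \zeta K \log(N/\delta) \rceil$, flips each measurement sign with probability γ, ranks coordinates by $\max\{Q_i^+, Q_i^-\}$, and returns the sign recovery error $\sum_i |\widehat{\mathrm{sgn}}(x_i) - \mathrm{sgn}(x_i)|/K$ over the top-K coordinates.

```python
import numpy as np


def simulate(N=1000, K=20, zeta=6.0, delta=0.01, alpha=0.05, gamma=0.0, seed=0):
    """One run of the top-K variant of Alg. 1; returns sum_i |sgn_hat(x_i) - sgn(x_i)| / K."""
    rng = np.random.default_rng(seed)
    M = int(np.ceil(zeta * K * np.log(N / delta)))

    # K-sparse signal whose nonzero entries are i.i.d. N(0, 5^2).
    x = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    x[support] = rng.normal(0.0, 5.0, size=K)

    # s_ij = g(u_ij, w_ij; alpha) as in (1); the decoder only needs sgn(u_ij) and w_ij.
    u = rng.uniform(-np.pi / 2, np.pi / 2, size=(N, M))
    w = rng.exponential(1.0, size=(N, M))

    # sgn(y_j): only rows in the support contribute. Work with log|x_i s_ij| and rescale
    # by the column maximum so that the signed sum cannot overflow for small alpha.
    us, ws = u[support], w[support]
    log_abs_s = (np.log(np.abs(np.sin(alpha * us))) - np.log(np.cos(us)) / alpha
                 + (1 - alpha) / alpha * (np.log(np.cos((1 - alpha) * us)) - np.log(ws)))
    log_terms = np.log(np.abs(x[support]))[:, None] + log_abs_s
    signs = np.sign(x[support])[:, None] * np.sign(us)
    sign_y = np.sign((signs * np.exp(log_terms - log_terms.max(axis=0))).sum(axis=0))

    # Random sign flipping: r_j = -1 with probability gamma, else +1.
    sign_y *= np.where(rng.random(M) < gamma, -1.0, 1.0)

    # Q_i^+ and Q_i^- as in (2) and (3).
    a = sign_y[None, :] * np.sign(u) * np.exp(-(K - 1) * w)
    q_plus, q_minus = np.log1p(a).sum(axis=1), np.log1p(-a).sum(axis=1)

    # Practical variant (Sec. 6.1): keep the top-K coordinates by max(Q^+, Q^-).
    top = np.argsort(-np.maximum(q_plus, q_minus))[:K]
    est = np.where(q_plus[top] > q_minus[top], 1.0, -1.0)
    return np.abs(est - np.sign(x[top])).sum() / K


for gamma in (0.0, 0.1, 0.2):
    print(gamma, simulate(gamma=gamma))
```

A single call gives one noisy draw; reproducing a curve of Figure 3 would require averaging (or taking the median) over many seeds while sweeping ζ.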
6.3 Sign Recovery under Random Sign Flipping Noise

Figure 3 reports the sign recovery errors $\sum_i |\widehat{\mathrm{sgn}}(x_i) - \mathrm{sgn}(x_i)|/K$, where i ranges over the top-K reported coordinates. Note that, using this definition, the maximum sign recovery error can be as large as 2. In each panel, we report results for 3 different γ values (γ = 0, 0.1, and 0.2), where γ is the random sign flipping probability. The curves without labels (red, if color is available) correspond to γ = 0 (i.e., no random sign flipping errors).

The results in Figure 3 confirm that the proposed method works well, as predicted by the theoretical analysis. Moreover, the method is fairly robust against random sign flipping noise.






Figure 3: Sign recovery under random sign flipping noise. The number of measurements is chosen according to $\zeta K \log(N/\delta)$, for ζ ranging from 2 to 15. The recovery error is $\sum_i |\widehat{\mathrm{sgn}}(x_i) - \mathrm{sgn}(x_i)|/K$, where i ranges over the top-K reported coordinates ranked by $\max\{Q_i^+, Q_i^-\}$. Note that, using this definition, the maximum possible sign recovery error is 2. In each panel, the 3 curves correspond to 3 different random sign flipping probabilities, γ = 0, 0.1, and 0.2. The curve without a label (red, if color is available) is for γ = 0. We repeat each simulation 1000 times and report the median.
6.4 Estimation of K and the Impact on Recovery Performance

In the theoretical analysis, we have assumed that K is known, as in many prior studies of compressed sensing. The problem becomes more interesting when K cannot be assumed to be known. In our framework, there are two approaches to this problem. The first approach is to use a very small number of full measurements to estimate K. Because estimating K is much easier than recovering the signal itself, it is reasonable to expect that the required number of measurements will be (very) small.

Here we use the harmonic mean estimator [12]:

$$\hat{K} = \frac{-\frac{2}{\pi}\Gamma(-\alpha)\sin\left(\frac{\pi}{2}\alpha\right)}{\sum_{j=1}^{M} |y_j|^{-\alpha}} \left(M - \left(\frac{-\pi\Gamma(-2\alpha)\sin(\pi\alpha)}{\left[\Gamma(-\alpha)\sin\left(\frac{\pi}{2}\alpha\right)\right]^2} - 1\right)\right). \qquad (16)$$

For small α, $\hat{K}$ is essentially $M / \sum_{j=1}^{M} 1/|y_j|^\alpha$, with variance essentially $K^2/M$. Figure 4 provides a set of experiments confirming that using only a very small number (such as 5) of measurements to estimate K leads to recovery results very close to those obtained with the exact values of K.
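A minimal sketch of estimator (16) (helper names are hypothetical). It simulates full measurements of a signal whose K nonzero entries are ±1, so that the quantity the estimator actually targets, $\sum_i |x_i|^\alpha$, equals K exactly; for general signals this quantity approaches K only as α → 0. Each $y_j$ is formed in log space for the same overflow reason as in Section 2. For small α the two constants in (16) approach 1 and 2, which recovers the approximation $\hat{K} \approx M / \sum_j |y_j|^{-\alpha}$.

```python
import math
import random


def hm_constants(alpha):
    """Constants of (16): c1 = -(2/pi) Gamma(-alpha) sin(pi alpha / 2) and the factor c2."""
    c1 = -2.0 / math.pi * math.gamma(-alpha) * math.sin(math.pi * alpha / 2)
    c2 = (-math.pi * math.gamma(-2 * alpha) * math.sin(math.pi * alpha)
          / (math.gamma(-alpha) * math.sin(math.pi * alpha / 2)) ** 2)
    return c1, c2


def harmonic_mean_K(inv_pows, alpha):
    """Estimator (16), given the values |y_j|^(-alpha) for j = 1..M."""
    c1, c2 = hm_constants(alpha)
    M = len(inv_pows)
    return c1 / sum(inv_pows) * (M - (c2 - 1.0))


def inv_pow_measurement(x_nonzero, rng, alpha):
    """|y_j|^(-alpha) for one full measurement y_j = sum_i x_i s_ij, computed in log space."""
    terms = []
    for xi in x_nonzero:  # zero entries of x do not affect y_j
        u = rng.uniform(-math.pi / 2, math.pi / 2)
        w = rng.expovariate(1.0)
        log_abs_s = (math.log(abs(math.sin(alpha * u))) - math.log(math.cos(u)) / alpha
                     + (1 - alpha) / alpha * (math.log(math.cos((1 - alpha) * u)) - math.log(w)))
        sign = math.copysign(1.0, xi) * (1.0 if u > 0 else -1.0)
        terms.append((sign, math.log(abs(xi)) + log_abs_s))
    top = max(log_term for _, log_term in terms)
    scaled = sum(sign * math.exp(log_term - top) for sign, log_term in terms)
    return math.exp(-alpha * (top + math.log(abs(scaled))))


rng = random.Random(0)
alpha, K = 0.05, 20
x_nonzero = [rng.choice([-1.0, 1.0]) for _ in range(K)]  # sum_i |x_i|^alpha = K exactly
for M in (5, 50, 500):
    vals = [inv_pow_measurement(x_nonzero, rng, alpha) for _ in range(M)]
    print(M, round(harmonic_mean_K(vals, alpha), 2))
```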






Figure 4: Sign recovery with K estimated by the harmonic mean estimator [12]. In each panel, the unlabeled curve (red, if color is available) corresponds to the use of the exact values of K. With merely 5 samples (curves labeled “5”) for estimating K, the recovery results are already close to the results using the exact K values.

Another line of approach is to develop bit-estimators of K, which is an interesting and separate research problem, as reported in […].
6.5 Support Recovery

We can generalize the practical variant of Alg. 1. That is, after we rank the coordinates according to $\max\{Q_i^+, Q_i^-\}$, we can choose the top-βK coordinates for β ≥ 1. We have used β = 1 in previous experiments. Figure 5 reports the recall values for support recovery:

$$\text{recall} = \#\{\text{retrieved true nonzeros}\}/K,$$

for β = 1, 1.2, and 1.5. Note that in this case it suffices to present the recalls, because precision = #{retrieved true nonzeros}/(βK) = recall/β.

As expected, using larger β values can reduce the required number of measurements. This experiment could be of interest to practitioners who care about this trade-off.
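The recall and precision used here can be computed as in this small sketch (the function name is hypothetical, and `round` assumes βK is meant to be an integer). It also makes the identity precision = recall/β explicit.

```python
def top_beta_k_recall(q_plus, q_minus, support, K, beta):
    """Recall and precision of the top round(beta*K) coordinates ranked by max(Q_i^+, Q_i^-)."""
    scores = [max(p, m) for p, m in zip(q_plus, q_minus)]
    n_keep = int(round(beta * K))
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_keep]
    hits = len(set(kept) & set(support))
    return hits / K, hits / n_keep  # precision equals recall / beta when beta*K is an integer


# Toy scores for N = 5 coordinates, true support {0, 1}, K = 2.
q_plus = [5.0, -1.0, 3.0, 0.5, -2.0]
q_minus = [-6.0, 2.0, -4.0, -1.0, -0.1]
for beta in (1.0, 1.5):
    print(beta, top_beta_k_recall(q_plus, q_minus, [0, 1], 2, beta))
```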



Figure 5: Support recovery. We report the top-βK coordinates ranked by $\max\{Q_i^+, Q_i^-\}$, for β ∈ {1, 1.2, 1.5, 2}. We report the recall values, i.e., #{retrieved true nonzeros}/K. As expected, using a larger β reduces the required number of measurements, which is set to $\zeta K \log(N/\delta)$ (where δ = 0.01).
6.6 Comparisons with 1-bit Marginal Regression

It is helpful to provide a comparison with other 1-bit algorithms in the literature. Unfortunately, most of the available 1-bit algorithms are not one-scan methods. One exception is 1-bit marginal regression [20, 22], which can be viewed as a one-scan algorithm. Thus, it is the target competitor we should compare our method with.

Figure 6 reports the sign recovery accuracy of 1-bit marginal regression in our experimental setting. That is, we also choose $M = \zeta K \log(N/\delta)$, although for this approach we must enlarge ζ dramatically compared to our proposed method. We can see that even with ζ = 100, the errors of 1-bit marginal regression are still large.






Figure 6: Sign recovery with 1-bit marginal regression. The errors are still very large even with ζ = 100, i.e., $M = 100 K \log(N/\delta)$. Note that in each panel, the three curves correspond to three different random sign flipping probabilities: γ = 0, 0.1, and 0.2, respectively.
6.7 Comparisons with 1-bit Iterative Hard Thresholding (IHT)

We conclude this section by providing a comparison with the well-known 1-bit iterative hard thresholding (IHT) [11]. Even though 1-bit IHT is not a one-scan algorithm, we compare it with our method for completeness. As shown in Figure 7, the proposed algorithm is still significantly more accurate for sign recovery.

Note that Figure 7 does not include results of 1-bit IHT with random sign flipping noise. As previously shown, the proposed method is reasonably robust against this type of noise. However, we observe that 1-bit IHT is so sensitive to random sign flipping that the results are not presentable¹.






Figure 7: Sign recovery with 1-bit iterative hard thresholding (IHT). The results of 1-bit IHT are presented as dashed (blue, if color is available) curves. For comparison, we also plot the results of the proposed method (solid and red, if color is available).


7 Conclusion 

1-bit compressed sensing (CS) is an important topic because the measurements are typically quantized (by hardware), and using only the sign information may potentially lead to cost reduction in collection, transmission, storage, and retrieval. Current methods for 1-bit CS are less satisfactory because they require a very large number of measurements and the decoding is typically not one-scan. Inspired by a recent method of compressed sensing with a very heavy-tailed design, we develop an algorithm for one-scan 1-bit CS, which is provably accurate and fast, as validated by experiments.

For sign recovery, our proposed one-scan 1-bit algorithm requires orders of magnitude fewer measurements than 1-bit marginal regression. Our method is still significantly more accurate than 1-bit Iterative Hard Thresholding (IHT), which is not one-scan. Moreover, unlike 1-bit IHT, the proposed algorithm is reasonably robust against random sign flipping noise.

¹ After consulting the author of [11], we decided not to present the random sign flipping experiment for 1-bit IHT.
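Appendix

A Proof of Lemma 2

Recall

$$Q_i^+ = \sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(y_j)\,\mathrm{sgn}(u_{ij})\,e^{-(K-1)w_{ij}}\right) = \sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(y_j/s_{ij})\,e^{-(K-1)w_{ij}}\right),$$

where $\frac{y_j}{s_{ij}} = x_i + \frac{\sum_{t \ne i} x_t s_{tj}}{s_{ij}} = x_i + \theta_i \frac{S_j}{s_{ij}}$. Here, $S_j \sim S(\alpha, 1)$ is independent of $s_{ij}$, and for convenience we define $\theta = \left(\sum_{t=1}^{N} |x_t|^\alpha\right)^{1/\alpha}$ and $\theta_i = \left(\theta^\alpha - |x_i|^\alpha\right)^{1/\alpha}$. In particular, if $x_i = 0$, then $\theta_i = \theta$ and $\mathrm{sgn}(y_j/s_{ij}) = \mathrm{sgn}(S_j/s_{ij})$. As $S_j$ and $s_{ij}$ are symmetric and independent, we can replace $\mathrm{sgn}(S_j/s_{ij})$ by $\mathrm{sgn}(s_{ij}) = \mathrm{sgn}(u_{ij})$. To see this,

$$\begin{aligned}
\Pr\left(\mathrm{sgn}(S_j/s_{ij}) = 1\right) &= \Pr\left(\mathrm{sgn}(s_{ij}/S_j) = 1\right)\\
&= \Pr\left(\mathrm{sgn}(s_{ij}) = 1\right)\Pr(S_j > 0) + \Pr\left(\mathrm{sgn}(s_{ij}) = -1\right)\Pr(S_j < 0)\\
&= \frac{1}{2}\cdot\frac{1}{2} + \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{2} = \Pr\left(\mathrm{sgn}(s_{ij}) = 1\right).
\end{aligned}$$

Thus, we have

$$\begin{aligned}
&\Pr\left(Q_i^+ > \epsilon M/K,\ x_i = 0\right)\\
=&\Pr\left(\sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(y_j/s_{ij}) \exp(-(K-1)w_{ij})\right) > \epsilon M/K,\ x_i = 0\right)\\
=&\Pr\left(\sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(S_j/s_{ij}) \exp(-(K-1)w_{ij})\right) > \epsilon M/K\right)\\
=&\Pr\left(\sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(u_{ij}) \exp(-(K-1)w_{ij})\right) > \epsilon M/K\right)\\
=&\Pr\left(\prod_{j=1}^{M} \left(1 + \mathrm{sgn}(u_{ij}) \exp(-(K-1)w_{ij})\right)^t > e^{\epsilon M t/K}\right), \qquad t > 0\\
\le& e^{-\epsilon M t/K}\, E^M\left(1 + \mathrm{sgn}(u_{ij}) \exp(-(K-1)w_{ij})\right)^t \qquad \text{(Markov's inequality)}\\
=& e^{-\epsilon M t/K} \left(E\left\{\frac{1}{2}\left(1 + e^{-(K-1)w_{ij}}\right)^t + \frac{1}{2}\left(1 - e^{-(K-1)w_{ij}}\right)^t\right\}\right)^M\\
=& e^{-\epsilon M t/K} \left(\frac{1}{2}\int_0^\infty \left[\left(1 + e^{-(K-1)w}\right)^t + \left(1 - e^{-(K-1)w}\right)^t\right] e^{-w}\, dw\right)^M.
\end{aligned}$$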
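Then we need to choose t to minimize the upper bound. Let $b = K - 1$; then, with the substitution $v = e^{-w}$,

$$\begin{aligned}
\int_0^\infty \left(1 + e^{-bw}\right)^t e^{-w}\, dw &= \int_0^1 \left(1 + v^b\right)^t dv\\
&= \int_0^1 \left[1 + v^b t + v^{2b}\frac{t(t-1)}{2!} + v^{3b}\frac{t(t-1)(t-2)}{3!} + v^{4b}\frac{t(t-1)(t-2)(t-3)}{4!} + \cdots\right] dv\\
&= 1 + \frac{t}{b+1} + \frac{t(t-1)}{(2b+1)2!} + \frac{t(t-1)(t-2)}{(3b+1)3!} + \cdots,\\
\int_0^\infty \left(1 - e^{-bw}\right)^t e^{-w}\, dw &= \int_0^1 \left(1 - v^b\right)^t dv\\
&= \int_0^1 \left[1 - v^b t + v^{2b}\frac{t(t-1)}{2!} - v^{3b}\frac{t(t-1)(t-2)}{3!} + v^{4b}\frac{t(t-1)(t-2)(t-3)}{4!} - \cdots\right] dv\\
&= 1 - \frac{t}{b+1} + \frac{t(t-1)}{(2b+1)2!} - \frac{t(t-1)(t-2)}{(3b+1)3!} + \cdots,
\end{aligned}$$

and hence

$$\int_0^\infty \left[\left(1 - e^{-(K-1)w}\right)^t + \left(1 + e^{-(K-1)w}\right)^t\right] e^{-w}\, dw = 2 + 2\frac{t(t-1)}{(2K-1)2!} + 2\frac{t(t-1)(t-2)(t-3)}{(4K-3)4!} + \cdots.$$

Therefore, for any $t > 0$, we have

$$\begin{aligned}
\Pr\left(Q_i^+ > \epsilon M/K,\ x_i = 0\right) &= \Pr\left(Q_i^- > \epsilon M/K,\ x_i = 0\right)\\
&\le e^{-\epsilon M t/K}\left(1 + \frac{t(t-1)}{(2K-1)2!} + \frac{t(t-1)(t-2)(t-3)}{(4K-3)4!} + \cdots\right)^M\\
&= \exp\left\{-\frac{M}{K}\left(\epsilon t - K\log\left(1 + \frac{t(t-1)}{(2K-1)2!} + \frac{t(t-1)(t-2)(t-3)}{(4K-3)4!} + \cdots\right)\right)\right\}\\
&= \exp\left\{-\frac{M}{K} H_1(t; \epsilon, K)\right\},
\end{aligned}$$

where

$$H_1(t; \epsilon, K) = \epsilon t - K\log\left(1 + \frac{t(t-1)}{(2K-1)2!} + \frac{t(t-1)(t-2)(t-3)}{(4K-3)4!} + \cdots\right),$$

$$H_1(t; \epsilon, \infty) = \epsilon t - \left(\frac{t(t-1)}{2 \times 2!} + \frac{t(t-1)(t-2)(t-3)}{4 \times 4!} + \cdots\right).$$

Note that, by L'Hôpital's rule, we have

$$\lim_{K\to\infty} \frac{\log\left(1 + \frac{t(t-1)}{(2K-1)2!} + \frac{t(t-1)(t-2)(t-3)}{(4K-3)4!} + \cdots\right)}{1/K} = \lim_{K\to\infty} \frac{\left(-\frac{2t(t-1)}{(2K-1)^2 2!} - \frac{4t(t-1)(t-2)(t-3)}{(4K-3)^2 4!} - \cdots\right)\Big/\left(1 + \frac{t(t-1)}{(2K-1)2!} + \cdots\right)}{-1/K^2} = \frac{t(t-1)}{2 \times 2!} + \frac{t(t-1)(t-2)(t-3)}{4 \times 4!} + \cdots.$$

This completes the proof.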
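The key expectation in this proof can be checked numerically. The sketch below (helper names are ours) compares a Monte Carlo estimate of $E\left(1 + \mathrm{sgn}(u_{ij})e^{-(K-1)w_{ij}}\right)^t$, with $\mathrm{sgn}(u_{ij})$ drawn as a fair random sign, against the truncated series inside the logarithm of (7).

```python
import numpy as np


def series_A1(t, K, n_max=2000):
    """1 + sum over even n of prod_{l<n} (t - l)/(n - l) / (n(K-1) + 1), truncated at n_max."""
    total, c = 1.0, 1.0
    for n in range(1, n_max + 1):
        c *= (t - n + 1) / n
        if n % 2 == 0:
            total += c / (n * (K - 1) + 1)
    return total


def monte_carlo_A1(t, K, n=400_000, seed=0):
    """Monte Carlo estimate of E (1 + sgn(u) exp(-(K-1) w))^t with w ~ exp(1)."""
    rng = np.random.default_rng(seed)
    sign = rng.choice([-1.0, 1.0], size=n)  # sgn(u_ij) is a fair random sign
    w = rng.exponential(1.0, size=n)
    return float(np.mean((1.0 + sign * np.exp(-(K - 1) * w)) ** t))


for t, K in [(0.5, 5), (0.5, 20), (2.0, 10)]:
    print(t, K, series_A1(t, K), monte_carlo_A1(t, K))
```

For integer t the series terminates; for example, t = 2 gives exactly $1 + 1/(2K-1)$.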


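B Proof of Lemma 3

$$\begin{aligned}
&\Pr\left(Q_i^+ < \epsilon M/K,\ x_i > 0\right)\\
=&\Pr\left(\sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(y_j/s_{ij}) \exp(-(K-1)w_{ij})\right) < \epsilon M/K,\ x_i > 0\right)\\
=&\Pr\left(\exp\left(-t\sum_{j=1}^{M} \log\left(1 + \mathrm{sgn}(y_j/s_{ij}) \exp(-(K-1)w_{ij})\right)\right) > \exp(-t\epsilon M/K),\ x_i > 0\right), \qquad t > 0\\
=&\Pr\left(\prod_{j=1}^{M} \left(1 + \mathrm{sgn}(y_j/s_{ij}) \exp(-(K-1)w_{ij})\right)^{-t} > \exp(-t\epsilon M/K),\ x_i > 0\right)\\
\le& \exp(t\epsilon M/K)\, E^M\left(\left(1 + \mathrm{sgn}(y_j/s_{ij}) \exp(-(K-1)w_{ij})\right)^{-t} \,\middle|\, x_i > 0\right).
\end{aligned}$$

Consider, for convenience, $\alpha \to 0$ and $x_i > 0$. Again, we study $\mathrm{sgn}(y_j/s_{ij}) = \mathrm{sgn}(x_i + \theta_i S_j/s_{ij})$, where $S_j, s_{ij} \sim S(\alpha, 1)$ i.i.d.; we write $(u_j, w_j)$ for the uniform and exponential variables that generate $S_j$. Let $T_{ij} = \mathrm{sgn}(y_j/s_{ij}) \exp(-(K-1)w_{ij})$. As $\alpha \to 0$,

$$\begin{aligned}
T_{ij} &= \mathrm{sgn}\left(x_i + \theta_i\,\mathrm{sgn}(u_j)\,\mathrm{sgn}(u_{ij}) \left(\frac{w_{ij}}{w_j}\right)^{1/\alpha}\right) e^{-(K-1)w_{ij}} = \mathrm{sgn}\left(x_i + \mathrm{sgn}(u_j)\,\mathrm{sgn}(u_{ij}) \left(\frac{(K-1)w_{ij}}{w_j}\right)^{1/\alpha}\right) e^{-(K-1)w_{ij}}\\
&= \begin{cases} \mathrm{sgn}(x_i)\, e^{-(K-1)w_{ij}} & \text{if } (K-1)w_{ij} < w_j,\\ \mathrm{sgn}(u_j)\,\mathrm{sgn}(u_{ij})\, e^{-(K-1)w_{ij}} & \text{if } (K-1)w_{ij} > w_j.\end{cases}
\end{aligned}$$

Thus, writing the expectation over $w_{ij}$ as an integral over $u$,

$$\begin{aligned}
&E\left(\left(1 + \mathrm{sgn}(y_j/s_{ij}) \exp(-(K-1)w_{ij})\right)^{-t} \,\middle|\, x_i > 0\right)\\
=&E\left[\int_0^{w_j/(K-1)} \left(1 + e^{-(K-1)u}\right)^{-t} e^{-u}\, du\right] + \frac{1}{2}E\left[\int_{w_j/(K-1)}^\infty \left(1 + e^{-(K-1)u}\right)^{-t} e^{-u}\, du\right] + \frac{1}{2}E\left[\int_{w_j/(K-1)}^\infty \left(1 - e^{-(K-1)u}\right)^{-t} e^{-u}\, du\right]\\
=&\frac{1}{2}\int_0^\infty \left[\left(1 + e^{-(K-1)u}\right)^{-t} + \left(1 - e^{-(K-1)u}\right)^{-t}\right] e^{-u}\, du - \frac{1}{2}E\left[\int_0^{w_j/(K-1)} \left[\left(1 - e^{-(K-1)u}\right)^{-t} - \left(1 + e^{-(K-1)u}\right)^{-t}\right] e^{-u}\, du\right]\\
=&\frac{1}{2}\int_0^1 \left[\left(1 + v^{K-1}\right)^{-t} + \left(1 - v^{K-1}\right)^{-t}\right] dv - \frac{1}{2}\int_0^\infty e^{-w}\int_{e^{-w/(K-1)}}^1 \left[\left(1 - v^{K-1}\right)^{-t} - \left(1 + v^{K-1}\right)^{-t}\right] dv\, dw.
\end{aligned}$$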
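Again, for convenience, we denote $b=K-1$. Expanding $(1+u^b)^{-t}$ in powers of $u^b$ gives

$$
\begin{aligned}
\int_0^1\left(1+u^b\right)^{-t}du
&=\int_0^1\left[1-u^bt+u^{2b}\frac{(-t)(-t-1)}{2!}+u^{3b}\frac{(-t)(-t-1)(-t-2)}{3!}+u^{4b}\frac{(-t)(-t-1)(-t-2)(-t-3)}{4!}+\cdots\right]du\\
&=1-\frac{t}{b+1}+\frac{t(t+1)}{(2b+1)2!}-\frac{t(t+1)(t+2)}{(3b+1)3!}+\frac{t(t+1)(t+2)(t+3)}{(4b+1)4!}-\cdots
\end{aligned}
$$

Expanding $(1-u^b)^{-t}=\sum_{n\ge0}\frac{t(t+1)\cdots(t+n-1)}{n!}u^{nb}$ in the same way, the odd-order terms cancel in the average, so

$$
\frac12\int_0^1\left(1+u^b\right)^{-t}du+\frac12\int_0^1\left(1-u^b\right)^{-t}du=1+\frac{t(t+1)}{(2b+1)2!}+\frac{t(t+1)(t+2)(t+3)}{(4b+1)4!}+\cdots
$$

For the other term, we have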


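$$
\begin{aligned}
&\int_0^{\infty}e^{-w}\int_{e^{-w/b}}^1\left[\left(1-u^b\right)^{-t}-\left(1+u^b\right)^{-t}\right]du\,dw\\
=&2\int_0^{\infty}e^{-w}\int_{e^{-w/b}}^1\left[tu^b+\frac{t(t+1)(t+2)}{3!}u^{3b}+\frac{t(t+1)(t+2)(t+3)(t+4)}{5!}u^{5b}+\cdots\right]du\,dw\\
=&2\int_0^{\infty}e^{-w}\left[\frac{t}{b+1}\left(1-\left(e^{-w/b}\right)^{b+1}\right)+\frac{t(t+1)(t+2)}{(3b+1)3!}\left(1-\left(e^{-w/b}\right)^{3b+1}\right)+\frac{t(t+1)(t+2)(t+3)(t+4)}{(5b+1)5!}\left(1-\left(e^{-w/b}\right)^{5b+1}\right)+\cdots\right]dw\\
=&2\left[\frac{t}{b+1}+\frac{t(t+1)(t+2)}{(3b+1)3!}+\frac{t(t+1)(t+2)(t+3)(t+4)}{(5b+1)5!}+\cdots\right]
-2\left[\frac{t}{b+1}\frac{b}{2b+1}+\frac{t(t+1)(t+2)}{3!(3b+1)}\frac{b}{4b+1}+\frac{t(t+1)(t+2)(t+3)(t+4)}{5!(5b+1)}\frac{b}{6b+1}+\cdots\right]\\
=&2\left[\frac{t}{2b+1}+\frac{t(t+1)(t+2)}{3!(4b+1)}+\frac{t(t+1)(t+2)(t+3)(t+4)}{5!(6b+1)}+\cdots\right]
\end{aligned}
$$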
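Both series identities above can be checked numerically. The sketch below is our own illustration (the choices $t=0.3$, $b=4$, the truncation levels, and all function names are assumptions). It relies on three standard facts that the text does not state: substituting $v=u^b$ gives $\int_0^1u^{mb}(1-u^b)^{-t}du=\frac1bB\left(m+\frac1b,1-t\right)$ for $t<1$; swapping the order of integration turns the other term into $\int_0^1u^b\left[(1-u^b)^{-t}-(1+u^b)^{-t}\right]du$, because $\Pr\left(e^{-w/b}<u\right)=u^b$ when $w\sim$ Exp(1); and the remaining integrands involving $1+u^b$ are smooth, so a midpoint rule suffices. The truncated series converge slowly (their terms decay like $n^{t-2}$), hence the loose tolerance.

```python
import math


def rising_coeffs(t: float, n_max: int) -> list:
    """c_n = t(t+1)...(t+n-1)/n!, the coefficients of (1-x)^(-t) = sum_n c_n x^n."""
    c = [1.0]
    for n in range(1, n_max + 1):
        c.append(c[-1] * (t + n - 1) / n)
    return c


def beta_fn(x: float, y: float) -> float:
    """Euler Beta function B(x, y) via the Gamma function."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)


def midpoint(f, m: int = 20_000) -> float:
    """Midpoint rule on [0, 1]; adequate for the smooth (1 + u^b) integrands."""
    h = 1.0 / m
    return h * sum(f((k + 0.5) * h) for k in range(m))


def check(t: float = 0.3, b: int = 4, n_max: int = 200_000):
    c = rising_coeffs(t, n_max)
    # Even-order average: (1/2) int (1+u^b)^-t du + (1/2) int (1-u^b)^-t du
    plus = midpoint(lambda u: (1.0 + u ** b) ** (-t))
    minus = beta_fn(1.0 / b, 1.0 - t) / b
    avg_direct = 0.5 * (plus + minus)
    avg_series = sum(c[n] / (n * b + 1) for n in range(0, n_max + 1, 2))
    # Other term after swapping integrals: int_0^1 u^b [(1-u^b)^-t - (1+u^b)^-t] du
    other_direct = beta_fn(1.0 + 1.0 / b, 1.0 - t) / b - midpoint(
        lambda u: u ** b * (1.0 + u ** b) ** (-t)
    )
    other_series = 2.0 * sum(c[n] / ((n + 1) * b + 1) for n in range(1, n_max + 1, 2))
    return avg_direct, avg_series, other_direct, other_series


if __name__ == "__main__":
    a_d, a_s, o_d, o_s = check()
    print(f"even-order average: direct {a_d:.6f}, series {a_s:.6f}")
    print(f"other term:         direct {o_d:.6f}, series {o_s:.6f}")
```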

Combining the results yields

$$
\begin{aligned}
&E\left(\left(1+\mathrm{sgn}(y_j/s_{ij})\exp\left(-(K-1)w_{ij}\right)\right)^{-t};\ x_i>0\right)\\
=&1-\left[\frac{t}{b+1}-\frac{t(t+1)}{(2b+1)2!}+\frac{t(t+1)(t+2)}{(3b+1)3!}-\frac{t(t+1)(t+2)(t+3)}{(4b+1)4!}+\frac{t(t+1)(t+2)(t+3)(t+4)}{(5b+1)5!}-\cdots\right]\\
&+\left[\frac{t}{b+1}\frac{b}{2b+1}+\frac{t(t+1)(t+2)}{3!(3b+1)}\frac{b}{4b+1}+\frac{t(t+1)(t+2)(t+3)(t+4)}{5!(5b+1)}\frac{b}{6b+1}+\cdots\right]\\
=&1-\left[\frac{t}{2b+1}-\frac{t(t+1)}{(2b+1)2!}+\frac{t(t+1)(t+2)}{(4b+1)3!}-\frac{t(t+1)(t+2)(t+3)}{(4b+1)4!}+\frac{t(t+1)(t+2)(t+3)(t+4)}{(6b+1)5!}-\cdots\right]
\end{aligned}
$$

Therefore, we can write


$$
\Pr\left(Q_i^+<\epsilon M/K,\ x_i>0\right)\le\exp\left(-\frac{M}{K}H_2(t;\epsilon,K)\right)
$$

where

$$
H_2(t;\epsilon,K)=-\epsilon t-K\log\left(1+\sum_{n=2,4,6,\ldots}\frac{1}{n(K-1)+1}\prod_{l=0}^{n-1}\frac{t+l}{n-l}-\sum_{n=1,3,5,\ldots}\frac{1}{(n+1)(K-1)+1}\prod_{l=0}^{n-1}\frac{t+l}{n-l}\right)
$$

$$
H_2(t;\epsilon,\infty)=-\epsilon t-\sum_{n=2,4,6,\ldots}\frac{1}{n}\prod_{l=0}^{n-1}\frac{t+l}{n-l}+\sum_{n=1,3,5,\ldots}\frac{1}{n+1}\prod_{l=0}^{n-1}\frac{t+l}{n-l}
$$
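The exponent $H_2$ is straightforward to evaluate numerically. The sketch below is an illustration, not the paper's code. It truncates the series at a hypothetical cutoff `n_max` and pairs the odd term $n=2m-1$ with the even term $n=2m$: they share the denominator $2m(K-1)+1$, and the paired differences decay like $n^{t-3}$, so truncation error is small. It restricts $t$ to $(0,1)$, where the integrals above converge, checks that $H_2(t;\epsilon,K)$ approaches $H_2(t;\epsilon,\infty)$ as $K$ grows, and picks the best $t$ on a grid. The values of $\epsilon$, $K$, and the grid are arbitrary.

```python
import math


def rising_coeffs(t: float, n_max: int) -> list:
    """c_n = t(t+1)...(t+n-1)/n! = prod_{l=0}^{n-1} (t+l)/(n-l)."""
    c = [1.0]
    for n in range(1, n_max + 1):
        c.append(c[-1] * (t + n - 1) / n)
    return c


def H2(t: float, eps: float, K: float, n_max: int = 2_000) -> float:
    """H_2(t; eps, K); odd term n-1 and even term n share the denominator n(K-1)+1."""
    c = rising_coeffs(t, n_max)
    b = K - 1
    s = sum((c[n] - c[n - 1]) / (n * b + 1) for n in range(2, n_max + 1, 2))
    return -eps * t - K * math.log1p(s)


def H2_inf(t: float, eps: float, n_max: int = 2_000) -> float:
    """H_2(t; eps, infinity) with the same pairing: sum over even n of (c_n - c_{n-1}) / n."""
    c = rising_coeffs(t, n_max)
    s = sum((c[n] - c[n - 1]) / n for n in range(2, n_max + 1, 2))
    return -eps * t - s


def best_t(eps: float, K: float, steps: int = 99) -> float:
    """Grid search for the t in (0, 1) that maximizes H_2 (a larger exponent gives a tighter bound)."""
    grid = [k / (steps + 1) for k in range(1, steps + 1)]
    return max(grid, key=lambda t: H2(t, eps, K))


if __name__ == "__main__":
    eps = 0.1
    for K in (5, 50, 10**6):
        print(K, H2(0.3, eps, K))
    print("inf", H2_inf(0.3, eps))
    t_star = best_t(eps, 10)
    print("best t for K=10:", t_star, H2(t_star, eps, 10))
```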


C Proof of Lemma 4

We introduce independent binary variables $r_j$, $j=1$ to $M$, so that $r_j=1$ with probability $1-\gamma$ and $r_j=-1$ with probability $\gamma$.
Define

$$
Q_{i,\gamma}^+=\sum_{j=1}^M\log\left(1+\mathrm{sgn}(r_jy_j)\,\mathrm{sgn}(u_{ij})\,e^{-(K-1)w_{ij}}\right)=\sum_{j=1}^M\log\left(1+\mathrm{sgn}(r_jy_j/s_{ij})\,e^{-(K-1)w_{ij}}\right)
$$
Note that $\mathrm{sgn}(r_ju_{ij})=1$ with probability $\frac12(1-\gamma)+\frac12\gamma=\frac12$; hence it has the same distribution as $\mathrm{sgn}(u_{ij})$. Following the proof of Lemma 2, we can derive


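$$
\begin{aligned}
&\Pr\left(Q_{i,\gamma}^+>\epsilon M/K,\ x_i=0\right)\\
=&\Pr\left(\sum_{j=1}^M\log\left(1+\mathrm{sgn}(r_jy_j/s_{ij})\exp\left(-(K-1)w_{ij}\right)\right)>\epsilon M/K,\ x_i=0\right)\\
=&\Pr\left(\sum_{j=1}^M\log\left(1+\mathrm{sgn}(r_jS_j/s_{ij})\exp\left(-(K-1)w_{ij}\right)\right)>\epsilon M/K\right)\\
=&\Pr\left(\sum_{j=1}^M\log\left(1+\mathrm{sgn}(r_ju_{ij})\exp\left(-(K-1)w_{ij}\right)\right)>\epsilon M/K\right)\\
=&\Pr\left(\prod_{j=1}^M\left(1+\mathrm{sgn}(r_ju_{ij})\exp\left(-(K-1)w_{ij}\right)\right)>e^{\epsilon M/K}\right)\\
=&\Pr\left(\prod_{j=1}^M\left(1+\mathrm{sgn}(u_{ij})\exp\left(-(K-1)w_{ij}\right)\right)>e^{\epsilon M/K}\right)
\end{aligned}
$$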


At this point, the problem is identical to the one in Lemma 2, which completes the proof.


D Proof of Lemma 5

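$$
\begin{aligned}
&\Pr\left(Q_{i,\gamma}^+<\epsilon M/K,\ x_i>0\right)\\
=&\Pr\left(\prod_{j=1}^M\left(1+\mathrm{sgn}(r_jy_j/s_{ij})\exp\left(-(K-1)w_{ij}\right)\right)^{-t}>\exp\left(-t\epsilon M/K\right),\ x_i>0\right),\quad t>0\\
\le&\exp\left(t\epsilon M/K\right)E^M\left(\left(1+\mathrm{sgn}(r_jy_j/s_{ij})\exp\left(-(K-1)w_{ij}\right)\right)^{-t};\ x_i>0\right)
\end{aligned}
$$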

Consider $\alpha\to0$. We study $\mathrm{sgn}(r_jy_j/s_{ij})=\mathrm{sgn}(x_ir_j+r_j\theta_iS_j/s_{ij})$, where $S_j, s_{ij}\sim S(\alpha,1)$ i.i.d. Let $T_{ij}=\mathrm{sgn}(r_jy_j/s_{ij})\exp\left(-(K-1)w_{ij}\right)$. As $\alpha\to0$,

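$$
\begin{aligned}
T_{ij}&=\mathrm{sgn}\left(x_ir_j+r_j\theta_i\,\mathrm{sgn}(u_j)\,\mathrm{sgn}(u_{ij})\left(\frac{w_{ij}}{w_j}\right)^{1/\alpha}\right)e^{-(K-1)w_{ij}}\\
&=\mathrm{sgn}\left(x_ir_j+r_j\,\mathrm{sgn}(u_j)\,\mathrm{sgn}(u_{ij})\left(\frac{(K-1)w_{ij}}{w_j}\right)^{1/\alpha}\right)e^{-(K-1)w_{ij}}\\
&=\begin{cases}\mathrm{sgn}(r_jx_i)\,e^{-(K-1)w_{ij}} & \text{if } (K-1)w_{ij}<w_j\\ \mathrm{sgn}(r_ju_{ij})\,e^{-(K-1)w_{ij}} & \text{if } (K-1)w_{ij}>w_j\end{cases}
\end{aligned}
$$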
Thus,


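$$
\begin{aligned}
&E\left(\left(1+\mathrm{sgn}(r_jy_j/s_{ij})\exp\left(-(K-1)w_{ij}\right)\right)^{-t};\ x_i>0\right)\\
=&(1-\gamma)E\left\{\int_0^{w_j/(K-1)}\left(1+\exp(-(K-1)u)\right)^{-t}e^{-u}du\right\}+\gamma E\left\{\int_0^{w_j/(K-1)}\left(1-\exp(-(K-1)u)\right)^{-t}e^{-u}du\right\}\\
&+\frac12E\left\{\int_{w_j/(K-1)}^{\infty}\left(1+\exp(-(K-1)u)\right)^{-t}e^{-u}du\right\}+\frac12E\left\{\int_{w_j/(K-1)}^{\infty}\left(1-\exp(-(K-1)u)\right)^{-t}e^{-u}du\right\}\\
=&\frac12\int_0^{\infty}\left(1+\exp(-(K-1)u)\right)^{-t}e^{-u}du+\frac12\int_0^{\infty}\left(1-\exp(-(K-1)u)\right)^{-t}e^{-u}du\\
&+\left(\frac12-\gamma\right)E\left\{\int_0^{w_j/(K-1)}\left(1+\exp(-(K-1)u)\right)^{-t}e^{-u}du\right\}-\left(\frac12-\gamma\right)E\left\{\int_0^{w_j/(K-1)}\left(1-\exp(-(K-1)u)\right)^{-t}e^{-u}du\right\}\\
=&\frac12\int_0^1\left(1+u^{K-1}\right)^{-t}du+\frac12\int_0^1\left(1-u^{K-1}\right)^{-t}du-\left(\frac12-\gamma\right)\int_0^{\infty}e^{-w}\int_{e^{-w/(K-1)}}^1\left[\left(1-u^{K-1}\right)^{-t}-\left(1+u^{K-1}\right)^{-t}\right]du\,dw
\end{aligned}
$$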

Again, for convenience, we denote $b=K-1$. As shown in the proof of Lemma 3, we have

$$
\frac12\int_0^1\left(1+u^b\right)^{-t}du+\frac12\int_0^1\left(1-u^b\right)^{-t}du=1+\frac{t(t+1)}{(2b+1)2!}+\frac{t(t+1)(t+2)(t+3)}{(4b+1)4!}+\cdots
$$


For the other term, we have 

$$
\begin{aligned}
&\int_0^{\infty}e^{-w}\int_{e^{-w/b}}^1\left[\left(1-u^b\right)^{-t}-\left(1+u^b\right)^{-t}\right]du\,dw\\
=&2\int_0^{\infty}e^{-w}\int_{e^{-w/b}}^1\left[tu^b+\frac{t(t+1)(t+2)}{3!}u^{3b}+\frac{t(t+1)(t+2)(t+3)(t+4)}{5!}u^{5b}+\cdots\right]du\,dw\\
=&2\int_0^{\infty}e^{-w}\left[\frac{t}{b+1}\left(1-\left(e^{-w/b}\right)^{b+1}\right)+\frac{t(t+1)(t+2)}{(3b+1)3!}\left(1-\left(e^{-w/b}\right)^{3b+1}\right)+\frac{t(t+1)(t+2)(t+3)(t+4)}{(5b+1)5!}\left(1-\left(e^{-w/b}\right)^{5b+1}\right)+\cdots\right]dw\\
=&2\left[\frac{t}{b+1}+\frac{t(t+1)(t+2)}{(3b+1)3!}+\frac{t(t+1)(t+2)(t+3)(t+4)}{(5b+1)5!}+\cdots\right]
-2\left[\frac{t}{b+1}\frac{b}{2b+1}+\frac{t(t+1)(t+2)}{3!(3b+1)}\frac{b}{4b+1}+\frac{t(t+1)(t+2)(t+3)(t+4)}{5!(5b+1)}\frac{b}{6b+1}+\cdots\right]\\
=&2\left[\frac{t}{2b+1}+\frac{t(t+1)(t+2)}{3!(4b+1)}+\frac{t(t+1)(t+2)(t+3)(t+4)}{5!(6b+1)}+\cdots\right]
\end{aligned}
$$

Combining the results yields


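$$
\begin{aligned}
&E\left(\left(1+\mathrm{sgn}(r_jy_j/s_{ij})\exp\left(-(K-1)w_{ij}\right)\right)^{-t};\ x_i>0\right)\\
=&1+\frac{t(t+1)}{(2b+1)2!}+\frac{t(t+1)(t+2)(t+3)}{(4b+1)4!}+\cdots-(1-2\gamma)\left[\frac{t}{2b+1}+\frac{t(t+1)(t+2)}{3!(4b+1)}+\frac{t(t+1)(t+2)(t+3)(t+4)}{5!(6b+1)}+\cdots\right]
\end{aligned}
$$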
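The closed form with flips can be compared against a direct simulation of the limiting model. The sketch below is illustrative (parameter values, seed, and names are our choices). It assumes $w_{ij}, w_j$ i.i.d. Exp(1), $r_j=-1$ with probability $\gamma$, and $\mathrm{sgn}(r_ju_{ij})$ symmetric, draws $T_{ij}$ from the two cases above, and averages $(1+T_{ij})^{-t}$. Setting $\gamma=0$ recovers the expectation from the proof of Lemma 3. Because the even and odd parts no longer cancel pairwise when $\gamma>0$, the series is truncated at a large cutoff.

```python
import math
import random


def rising_coeffs(t: float, n_max: int) -> list:
    """c_n = t(t+1)...(t+n-1)/n!."""
    c = [1.0]
    for n in range(1, n_max + 1):
        c.append(c[-1] * (t + n - 1) / n)
    return c


def expectation_series(t: float, K: int, gamma: float, n_max: int = 200_000) -> float:
    """1 + sum_{n even >= 2} c_n/(nb+1) - (1 - 2 gamma) sum_{n odd} c_n/((n+1)b+1), b = K - 1."""
    c = rising_coeffs(t, n_max)
    b = K - 1
    even = sum(c[n] / (n * b + 1) for n in range(2, n_max + 1, 2))
    odd = sum(c[n] / ((n + 1) * b + 1) for n in range(1, n_max + 1, 2))
    return 1.0 + even - (1.0 - 2.0 * gamma) * odd


def expectation_mc(t: float, K: int, gamma: float, n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo average of (1 + T_ij)^(-t) for x_i > 0 under the alpha -> 0 model with flips."""
    rng = random.Random(seed)
    b = K - 1
    total = 0.0
    for _ in range(n):
        w_ij = rng.expovariate(1.0)
        w_j = rng.expovariate(1.0)
        if b * w_ij < w_j:
            sign = -1 if rng.random() < gamma else 1  # sgn(r_j x_i): flipped with prob. gamma
        else:
            sign = rng.choice((-1, 1))                # sgn(r_j u_ij): symmetric
        # 1 + sign * exp(-b w); expm1 keeps precision when the sign is negative and w is small
        base = 1.0 + math.exp(-b * w_ij) if sign > 0 else -math.expm1(-b * w_ij)
        total += base ** (-t)
    return total / n


if __name__ == "__main__":
    for gamma in (0.0, 0.1, 0.2):
        print(gamma, expectation_series(0.3, 5, gamma), expectation_mc(0.3, 5, gamma))
```

The expectation grows with $\gamma$, which weakens the Chernoff exponent; this is the mechanism by which sign flipping enters the bound.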
Therefore, we can write


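$$
\Pr\left(Q_{i,\gamma}^+<\epsilon M/K,\ x_i>0\right)\le\exp\left(-\frac{M}{K}H_4(t;\epsilon,K,\gamma)\right)
$$

where

$$
H_4(t;\epsilon,K,\gamma)=-\epsilon t-K\log\left(1+\sum_{n=2,4,6,\ldots}\frac{1}{n(K-1)+1}\prod_{l=0}^{n-1}\frac{t+l}{n-l}-\sum_{n=1,3,5,\ldots}\frac{1-2\gamma}{(n+1)(K-1)+1}\prod_{l=0}^{n-1}\frac{t+l}{n-l}\right)
$$

$$
H_4(t;\epsilon,\infty,\gamma)=-\epsilon t-\sum_{n=2,4,6,\ldots}\frac{1}{n}\prod_{l=0}^{n-1}\frac{t+l}{n-l}+\sum_{n=1,3,5,\ldots}\frac{1-2\gamma}{n+1}\prod_{l=0}^{n-1}\frac{t+l}{n-l}
$$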
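In the limit $K\to\infty$, the effect of flipping can be isolated. The odd-order sum has a closed form that we derive here (it is not stated in the paper): with $c_n=\prod_{l=0}^{n-1}\frac{t+l}{n-l}$, one has $\sum_{n\ \mathrm{odd}}c_nx^n=\frac12\left[(1-x)^{-t}-(1+x)^{-t}\right]$, and integrating over $x\in[0,1]$ gives $\sum_{n=1,3,5,\ldots}\frac{c_n}{n+1}=\frac{1-2^{-t}}{1-t}$ for $0<t<1$. Hence $H_4(t;\epsilon,\infty,\gamma)=H_2(t;\epsilon,\infty)-2\gamma\,\frac{1-2^{-t}}{1-t}$. The sketch below uses this to compare the best exponent over a grid of $t$ for several flipping probabilities; the grid, $\epsilon$, and function names are illustrative.

```python
import math


def rising_coeffs(t: float, n_max: int) -> list:
    """c_n = t(t+1)...(t+n-1)/n! = prod_{l=0}^{n-1} (t+l)/(n-l)."""
    c = [1.0]
    for n in range(1, n_max + 1):
        c.append(c[-1] * (t + n - 1) / n)
    return c


def H2_inf(t: float, eps: float, n_max: int = 2_000) -> float:
    """H_2(t; eps, infinity), pairing odd n-1 with even n so the truncated sum converges fast."""
    c = rising_coeffs(t, n_max)
    return -eps * t - sum((c[n] - c[n - 1]) / n for n in range(2, n_max + 1, 2))


def odd_sum_closed(t: float) -> float:
    """Sum over odd n of c_n / (n + 1) = (1 - 2^(-t)) / (1 - t), valid for 0 < t < 1."""
    return (1.0 - 2.0 ** (-t)) / (1.0 - t)


def H4_inf(t: float, eps: float, gamma: float) -> float:
    """H_4(t; eps, infinity, gamma) = H_2(t; eps, infinity) - 2 * gamma * (odd sum)."""
    return H2_inf(t, eps) - 2.0 * gamma * odd_sum_closed(t)


def best_exponent(eps: float, gamma: float, steps: int = 99) -> tuple:
    """Maximize H_4 over a grid of t in (0, 1); returns (t, H_4)."""
    grid = [k / (steps + 1) for k in range(1, steps + 1)]
    t_star = max(grid, key=lambda t: H4_inf(t, eps, gamma))
    return t_star, H4_inf(t_star, eps, gamma)


if __name__ == "__main__":
    for gamma in (0.0, 0.05, 0.1, 0.2):
        t_star, h = best_exponent(0.1, gamma)
        print(f"gamma={gamma}: best t={t_star:.2f}, H_4={h:.5f}")
```

Since the subtracted term is positive for every $t\in(0,1)$, the optimized exponent can only decrease as $\gamma$ grows.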